# English speech processing
Phoneme Scorer V2 Wav2vec2
Apache-2.0
An automatic speech recognition model based on the Wav2Vec2-Base architecture, fine-tuned for phoneme recognition on the LJSpeech Phonemes dataset.
Speech Recognition
Transformers English

ct-vikramanantha
167
9
Wav2vec2 Large Lv60 Phoneme Timit English Timit 4k
Apache-2.0
An English phoneme recognition model fine-tuned from facebook/wav2vec2-large-lv60, achieving a phoneme error rate of 10.53% on the TIMIT dataset.
Speech Recognition
Transformers English

excalibur12
306
3
Wav2vec2 Ljspeech Gruut
Apache-2.0
A phoneme recognition model based on the Wav2Vec2 architecture, fine-tuned on the LJSpeech Phonemes dataset to convert speech into phoneme sequences.
Speech Recognition
Transformers English

bookbot
2,484
17
Wav2vec2 Base Timit Demo Google Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, trained in a Google Colab environment.
Speech Recognition
Transformers

pannaga
16
0
Wav2vec2 Conformer Rope Large 960h Ft
Apache-2.0
A Wav2Vec2-Conformer model with rotary position embeddings, pre-trained and fine-tuned on 960 hours of LibriSpeech audio sampled at 16 kHz, suitable for English speech recognition.
Speech Recognition
Transformers English

facebook
22.02k
10
Wav2vec2 Conformer Rel Pos Large 960h Ft
Apache-2.0
A Wav2Vec2-Conformer model with relative position embeddings, pre-trained and fine-tuned on 960 hours of LibriSpeech audio sampled at 16 kHz.
Speech Recognition
Transformers English

facebook
1,038
5
Stt En Conformer Ctc Large
A large automatic speech recognition (ASR) model based on the Conformer architecture, trained with a CTC loss for English speech transcription.
Speech Recognition English
nvidia
3,740
24
Wav2vec2 Large 960h Lv60
Apache-2.0
Wav2Vec2 is a speech recognition model that learns features from raw audio through self-supervised learning and achieves high recognition accuracy with limited labeled data.
Speech Recognition English
facebook
7,011
6
Hubert Large Ls960 Ft
Apache-2.0
HuBERT-Large is a self-supervised speech representation model, fine-tuned on 960 hours of LibriSpeech data for automatic speech recognition.
Speech Recognition
Transformers English

facebook
776.27k
66
Hubert Xlarge Ls960 Ft
Apache-2.0
A HuBERT extra-large speech recognition model fine-tuned on 960 hours of LibriSpeech data, achieving a word error rate (WER) of 1.8 on the LibriSpeech test set.
Speech Recognition
Transformers English

facebook
8,160
14
Data2vec Audio Base 960h
Apache-2.0
Data2Vec is a general self-supervised learning framework applicable to speech, vision, and language processing. This checkpoint is a speech recognition model pre-trained and fine-tuned on 960 hours of LibriSpeech audio.
Speech Recognition
Transformers English

facebook
10.61k
12
Librispeech 100h Supervised
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-large-lv60 on the 100-hour LibriSpeech subset, with a low reported word error rate.
Speech Recognition
Transformers

Kuray107
14
0